Abstract. Consider the task of recovering an unknown n-vector from phaseless linear measurements. This task is the phase retrieval problem. Through the technique of lifting, this nonconvex problem may be convexified into a semidefinite rank-one matrix recovery problem, known as PhaseLift. Under a linear number of exact Gaussian measurements, PhaseLift recovers the unknown vector exactly with high probability.

Under noisy measurements, the solution to a variant of PhaseLift has error proportional to the $\ell_1$ norm of the noise. In the present paper, we study the robustness of this variant of PhaseLift to a case with noise and gross, arbitrary corruptions. We prove that PhaseLift can tolerate a small, fixed fraction of gross errors, even in the highly underdetermined regime where there are only $O(n)$ measurements. The lifted phase retrieval problem can be viewed as a rank-one robust Principal Component Analysis (PCA) problem under generic rank-one measurements. From this perspective, the proposed convex program is simpler than the semidefinite version of the sparse-plus-low-rank formulation standard in the robust PCA literature. Specifically, the rank penalization through a trace term is unnecessary, and the resulting optimization program has no parameters that need to be chosen. The present work also achieves the information theoretically optimal scaling of $O(n)$ measurements without the additional logarithmic factors that appear in existing general robust PCA results.
1. Introduction

This paper establishes robustness of an algorithm for recovering a vector $x_0 \in \mathbb{R}^n$ from phaseless linear measurements that contain noise and a constant fraction of gross, arbitrary errors. That is, for fixed measurement vectors $a_i \in \mathbb{R}^n$ for $i = 1, \ldots, m$, our task is to find $x_0$ satisfying

(1)  $$b_i = |\langle x_0, a_i \rangle|^2 + \varepsilon_i + \eta_i$$

for known $b_i \in \mathbb{R}$, known $a_i$, and unknown $\eta_i$ and $\varepsilon_i$. Here $\eta_i$ will represent the noise in the measurements, and $\varepsilon_i$ will represent gross, arbitrary errors. This recovery problem is known as phase retrieval. Measurements of the form (1) arise in several applications, such as X-ray crystallography, optics, and microscopy GOUM]. In such applications, extremely large errors in some measurements may be due to sensor failure, occlusions, or other effects. Ideally, recovery algorithms could provably tolerate a small number of such errors.
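To make model (1) concrete, here is a minimal numpy sketch that generates phaseless measurements with a sparse set of gross errors and small dense noise. The function name `phaseless_measurements`, the placement of the corruptions on the first indices, their range $[-10^4, 10^4]$, and the Gaussian noise level are all illustrative assumptions, not choices made in the paper.

```python
import numpy as np


def phaseless_measurements(x0, m, corrupt_frac=0.05, noise_std=0.0, seed=0):
    """Simulate b_i = |<x0, a_i>|^2 + eps_i + eta_i as in model (1).

    Hypothetical helper. Assumptions (illustration only): a_i ~ N(0, I_n) are
    the rows of A, the gross errors eps live on the first ceil(corrupt_frac*m)
    indices with values uniform in [-1e4, 1e4], and eta is i.i.d. Gaussian.
    """
    rng = np.random.default_rng(seed)
    n = x0.shape[0]
    A = rng.standard_normal((m, n))            # row i is the measurement vector a_i
    clean = (A @ x0) ** 2                      # phaseless values |<x0, a_i>|^2
    eps = np.zeros(m)
    k = int(np.ceil(corrupt_frac * m))         # size of the corrupted support S
    eps[:k] = rng.uniform(-1e4, 1e4, size=k)   # arbitrary gross errors on S
    eta = noise_std * rng.standard_normal(m)   # small dense noise
    return A, clean + eps + eta, eps, eta


x_demo = np.array([1.0, -2.0, 0.5])
A, b, eps, eta = phaseless_measurements(x_demo, m=40, noise_std=0.01)
```

Because only $|\langle x_0, a_i\rangle|^2$ enters, $x_0$ and $-x_0$ produce identical data, which is why recovery is only possible up to a global sign.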

Recently, researchers have introduced algorithms for the phase retrieval problem that have provable recovery guarantees mm. The insight of these methods is that the phase retrieval problem can be convexified by lifting it to the space of matrices. That is, instead of searching for the vector $x_0$, one can search for the lifted matrix $x_0 x_0^T$. The quadratic measurements (1) then become linear measurements on this lifted matrix, since $|\langle x_0, a_i \rangle|^2 = a_i^T (x_0 x_0^T) a_i$. As the desired matrix is semidefinite and rank-one, one can write a rank minimization problem under the semidefinite and data constraints, which has a convex relaxation known as PhaseLift. In the noiseless case, PhaseLift is the program

(2)  $$\min \operatorname{tr}(X) \quad \text{subject to} \quad X \succeq 0, \quad a_i^T X a_i = b_i, \; i = 1, \ldots, m.$$
Here, the trace of $X$ is a convex proxy for the rank of a positive semidefinite $X$. An estimate for the underlying signal $x_0$ can be computed from the leading eigenvector of the optimizer of (2).
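The lifting identity behind (2) can be checked numerically: the measurement is quadratic in $x_0$ but linear in $X_0 = x_0 x_0^T$, and for this rank-one positive semidefinite matrix the trace equals $\|x_0\|^2$. The random vectors below are arbitrary test data.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
x0 = rng.standard_normal(n)
a = rng.standard_normal(n)

X0 = np.outer(x0, x0)                  # lifted matrix x0 x0^T (rank one, PSD)
quadratic = np.dot(a, x0) ** 2         # |<x0, a>|^2: quadratic in x0
linear = a @ X0 @ a                    # a^T X0 a: linear in X0
inner = np.sum(np.outer(a, a) * X0)    # trace inner product <a a^T, X0>
```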

As in mm, we will seek recovery guarantees for independent identically distributed Gaussians

$$a_i \sim \mathcal{N}(0, I_n).$$

Under this data model, [1] and [2] have shown that (2) can be simplified to the semidefinite feasibility problem

$$\text{find } X \succeq 0 \text{ such that } a_i^T X a_i = b_i, \; i = 1, \ldots, m.$$

This feasibility problem succeeds at finding $x_0 x_0^T$ exactly with high probability when $m \geq cn$ for a sufficiently large $c$ [2]. This scaling is quite surprising because there are only $O(n)$ measurements for an $O(n^2)$-dimensional object. As discussed in [7], the semidefinite cone is sufficiently ‘pointy’ that the high-dimensional affine space of data-consistent matrices intersects the semidefinite cone at exactly one point.

In the noisy case without gross errors, that is, for $\varepsilon = 0$, [2] showed that the PhaseLift variant

(3)  $$\min \sum_{i=1}^m |a_i^T X a_i - b_i| \quad \text{subject to} \quad X \succeq 0$$

successfully recovers a matrix near $x_0 x_0^T$ with high probability. Specifically, they prove that the solution $\hat X$ to (3) satisfies $\|\hat X - x_0 x_0^T\|_F \leq C_0 \|\eta\|_1 / m$ with high probability. From $\hat X$, an estimate of $x_0$ can be obtained by

$$\hat x_0 = \sqrt{\lambda_1}\, u_1,$$

where $(\lambda_1, u_1)$ is the leading eigenvalue and eigenvector pair for $\hat X$. In [2], the authors prove that $\|\hat x_0 - \pm x_0\| \leq C_0 \min\left(\|x_0\|, \|\eta\|_1 / (m \|x_0\|)\right)$ for some $C_0$.
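The rounding step from $\hat X$ to $\hat x_0$ and the sign-invariant error $\|\hat x_0 - \pm x_0\|$ can be sketched as follows. The helper names are hypothetical, and `X_noisy` is a stand-in for a solver output, not data from the paper.

```python
import numpy as np


def estimate_signal(X_hat):
    """Return sqrt(lambda_1) * u_1 for the leading eigenpair of X_hat."""
    X_sym = (X_hat + X_hat.T) / 2              # guard against tiny asymmetry from a solver
    eigvals, eigvecs = np.linalg.eigh(X_sym)   # eigenvalues in ascending order
    lam1, u1 = eigvals[-1], eigvecs[:, -1]
    return np.sqrt(max(lam1, 0.0)) * u1


def sign_invariant_error(x_hat, x0):
    """Distance from x_hat to the nearer of +x0 and -x0."""
    return min(np.linalg.norm(x_hat - x0), np.linalg.norm(x_hat + x0))


x0 = np.array([3.0, -1.0, 2.0])
X_noisy = np.outer(x0, x0) + 1e-3 * np.eye(3)  # stand-in for a solver output
```

The minimum over $\pm$ is needed because the eigenvector $u_1$ is only defined up to sign.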

The contribution of the present paper is to show that the program (3) is additionally robust against a
constant fraction of arbitrary errors. For a fixed set of coefficients that contain gross errors, we show that 
approximate recovery succeeds with high probability for arbitrary signals and arbitrary values of the gross 
errors. 

Theorem 1. There exist positive numbers $f_{\min}, \gamma, c, C, C'$ such that the following holds. Let $m \geq cn$. Fix a set $S \subset \{1, \ldots, m\}$ such that $|S|/m \leq f_{\min}$. On an event of probability at least $1 - e^{-\gamma m}$, for any $x_0 \in \mathbb{R}^n$ and for any $\varepsilon$ with $\operatorname{supp}(\varepsilon) \subseteq S$, the minimizer $\hat X$ to (3) satisfies

$$\|\hat X - x_0 x_0^T\|_F \leq C \frac{\|\eta\|_1}{m}.$$

The resulting estimate for $x_0$ satisfies

$$\|\hat x_0 - \pm x_0\| \leq C' \min\left(\|x_0\|, \frac{\|\eta\|_1}{m \|x_0\|}\right).$$

Note that this high-probability result is universal over $x_0$ and $\varepsilon$; it does not merely apply to a fixed signal or to a fixed error vector $\varepsilon$.

In the case of no gross errors, in which $\varepsilon = 0$, this theorem reduces to the result in [2] mentioned above. In the noiseless case, in which $\eta = 0$, the theorem guarantees exact recovery of $x_0$ with high probability under a linear number of measurements, of which a constant fraction are corrupted.

We now explore the optimality of this theorem. The scaling of $m$ versus $n$ is information theoretically optimal and has no unnecessary logarithmic factors. The noise scaling is the same as in [2], and its optimality was established there. For arbitrary errors, the fixed fraction of gross errors cannot be extended to a case where $f_{\min} \geq 1/2$, because one could build a problem where half of the measurements are due to an $x_0$ and the other half are due to some $x_1$. In such a case, recovery would be impossible.

1.1. Relation to Robust PCA. Much recent work in matrix completion has studied the recovery of low-rank matrices from arbitrary corruptions to their entries, known as robust Principal Component Analysis (PCA). Results in this framework typically involve measuring some of the entries of a low-rank $n \times n$ matrix $X$ and assuming that some fraction of those measurements are arbitrarily corrupted, giving the data matrix $A$. The matrix $X$ can then be recovered under certain conditions by a sparse-plus-low-rank convex program:

(4)  $$\min_{X, E} \; \lambda \|X\|_* + \|E\|_1 \quad \text{such that} \quad \mathcal{P}(X + E) = \mathcal{P}(A)$$
where $\|X\|_*$ is the nuclear norm of $X$, $\lambda$ is a constant, $\|E\|_1$ is the $\ell_1$ norm of the vectorization of $E$, and $\mathcal{P}$ is
the projection of a matrix onto the observed entries. [10 EB [EEll 13 HU- Results from this formulation have 
been quite surprising. For example, under an appropriate choice of $\lambda$ and under an incoherence assumption, the sparse-plus-low-rank decomposition succeeds for sufficiently low-rank $X$ when $O(n^2)$ entries are measured and a small fraction of them have arbitrary errors 0 •. Subsequent results have been proved that only require $m \geq rn \operatorname{polylog}(n)$ measurements, where $r$ is the rank of $X$ [6] [12]. This result is information theoretically optimal except for the polylogarithmic factor.

The present paper can be viewed as a rank-one semidefinite robust PCA problem under generic rank-one measurements. From this perspective, we would naturally formulate the PhaseLift problem under gross errors by

(5)  $$\min \; \lambda \operatorname{tr}(X) + \sum_{i=1}^m |a_i^T X a_i - b_i| \quad \text{subject to} \quad X \succeq 0.$$

The present paper shows that explicit rank penalization by the trace term is not fundamental for exact recovery in the presence of arbitrary errors. That is, (5) can be simplified by taking $\lambda = 0$, which yields (3). The resulting program has no free parameters that need to be explicitly tuned. As in EH, the positive semidefinite cone provides enough of a constraint to enforce low-rankness. The present paper also shows that rank-one matrix completion can succeed under an information theoretically optimal data scaling. Specifically, the extra logarithmic factors from low-rank matrix completion and robust PCA do not appear in Theorem 1. The present work also differs from the standard robust PCA literature in that the measurements are generic and are not direct samples of the entries of the unknown matrix.


1.2. Numerical simulation. We now explore the empirical performance of (3) by numerical simulation. Let the signal length $n$ vary from 5 to 50, and let the number of measurements $m$ vary from 10 to 250. Let $x_0 = e_1$. For each $(n, m)$, we consider measurements such that

$$b_i \sim \operatorname{Uniform}([0, 10^4]) \quad \text{if } 1 \leq i \leq \lceil 0.05 m \rceil, \qquad b_i = |\langle x_0, a_i \rangle|^2 \quad \text{otherwise}.$$

We attempt to recover $x_0 x_0^T$ by solving (3) using the SDPT3 solver [15, 16] and YALMIP [13]. For a given optimizer $\hat X$, define the capped relative error as

$$\min\left(\|\hat X - x_0 x_0^T\|_F / \|x_0 x_0^T\|_F, \; 1\right).$$

Figure 1 plots the average capped relative error over 10 independent trials. It provides empirical evidence that the matrix recovery problem (3) succeeds under a linear number of measurements, even when a constant fraction of them contain very large errors.
Figure 1. Recovery error for the PhaseLift matrix recovery problem (3) as a function of $n$ and $m$, when 5% of measurements contain large errors. Black represents an average recovery error of 100%. White represents zero average recovery error. Each block corresponds to the average from 10 independent trials. The solid curve depicts when the number of measurements equals the number of degrees of freedom in a symmetric $n \times n$ matrix. The number of measurements required for successful recovery appears to be linear in $n$, even with a small fraction of large errors.
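The data model and the error metric of this experiment can be sketched in numpy as below. Solving (3) itself needs a semidefinite programming solver (the paper uses SDPT3 through YALMIP), so that step is not reproduced here. The helper names and the ceiling in $\lceil 0.05m \rceil$ are assumptions.

```python
import numpy as np


def simulation_measurements(n, m, seed=0):
    """Hypothetical helper for the Section 1.2 data: x0 = e1, the first
    ceil(0.05 m) values b_i are replaced by Uniform([0, 1e4]) draws,
    and the remaining b_i are exact. The ceiling rounding is an assumption."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((m, n))            # rows a_i ~ N(0, I_n)
    x0 = np.zeros(n)
    x0[0] = 1.0                                # x0 = e1
    b = (A @ x0) ** 2
    k = int(np.ceil(0.05 * m))
    b[:k] = rng.uniform(0.0, 1e4, size=k)      # grossly corrupted measurements
    return A, b, x0


def capped_relative_error(X_hat, x0):
    """min(||X_hat - x0 x0^T||_F / ||x0 x0^T||_F, 1)."""
    X0 = np.outer(x0, x0)
    rel = np.linalg.norm(X_hat - X0, "fro") / np.linalg.norm(X0, "fro")
    return min(rel, 1.0)


A, b, x0 = simulation_measurements(5, 100)
```

Capping at 1 keeps a single wildly wrong trial (for instance a solver failure) from dominating the average over the 10 trials.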
2. Proofs

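Let $\mathcal{A} : \mathcal{S}^n \to \mathbb{R}^m$ be defined by the mapping $X \mapsto (a_i^T X a_i)_{i=1}^m$, where $\mathcal{S}^n$ is the space of symmetric real-valued $n \times n$ matrices. Note that $\mathcal{A}^* \lambda = \sum_{i=1}^m \lambda_i a_i a_i^T$. Let $\mathcal{A}_S$ be the restriction of $\mathcal{A}$ onto the coefficients given by the set $S$. Let $e_i$ be the $i$th standard basis vector. Let $X_0 = x_0 x_0^T$. We can write the measurements (1) as

$$b = \mathcal{A} X_0 + \varepsilon + \eta.$$

Similarly, the optimization program (3) can be written as

$$\min \|\mathcal{A} X - b\|_1 \quad \text{such that} \quad X \succeq 0.$$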
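A minimal numpy sketch of the operator $\mathcal{A}$, its adjoint $\mathcal{A}^*$, and the $\ell_1$ objective of (3) follows; the measurement vectors are stored as rows of a matrix `A`, and the function names are hypothetical. The test checks the adjoint identity $\langle \mathcal{A}X, \lambda \rangle = \langle X, \mathcal{A}^*\lambda \rangle$, which the proof of Lemma 4 relies on.

```python
import numpy as np


def A_op(A, X):
    """Measurement operator: X -> (a_i^T X a_i)_{i=1..m}; rows of A are the a_i."""
    return np.einsum("ij,jk,ik->i", A, X, A)


def A_adj(A, lam):
    """Adjoint operator: lam -> sum_i lam_i a_i a_i^T = A^T diag(lam) A."""
    return (A * lam[:, None]).T @ A


def l1_objective(A, X, b):
    """Objective of program (3): ||A(X) - b||_1."""
    return np.sum(np.abs(A_op(A, X) - b))
```

At the true matrix $X_0$ with $\eta = 0$, the objective equals $\|\varepsilon\|_1$, the total size of the gross errors.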

We introduce the following notation. Let $\|X\|_1$ be the nuclear norm of the matrix $X$. When $X \succeq 0$, $\|X\|_1 = \operatorname{tr}(X)$. Denote the Frobenius and spectral norms of $X$ as $\|X\|_F$ and $\|X\|$, respectively. Given $x_0$, let $T_{x_0} = \{y x_0^T + x_0 y^T : y \in \mathbb{R}^n\}$. Note that $T_{e_1}$ is the space of symmetric matrices supported on their first row and column. The orthogonal complement $T_{e_1}^\perp$ is then the space of symmetric matrices supported in the lower-right $(n-1) \times (n-1)$ block. When $x_0$ is clear, we will simply write $T$ instead of $T_{x_0}$. For a matrix $X$, we write $X_T$ and $X_{T^\perp}$ for its orthogonal projections onto $T$ and $T^\perp$. Let $I$ be the identity matrix, and let $\mathbb{1}(E)$ be the indicator function of the event $E$.
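The projections onto $T_{x_0}$ and $T_{x_0}^\perp$ have a simple closed form: with $P$ the orthogonal projector onto $\operatorname{span}(x_0)$, $X_{T^\perp} = (I - P) X (I - P)$ and $X_T = X - X_{T^\perp} = PX + XP - PXP$. The sketch below (helper names are hypothetical) checks the $x_0 = e_1$ picture described above.

```python
import numpy as np


def proj_T_perp(X, x0):
    """X_{T^perp} = (I - P) X (I - P), with P the projector onto span(x0)."""
    u = x0 / np.linalg.norm(x0)
    Q = np.eye(len(u)) - np.outer(u, u)
    return Q @ X @ Q


def proj_T(X, x0):
    """X_T = X - X_{T^perp} = P X + X P - P X P."""
    return X - proj_T_perp(X, x0)


rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
X = (M + M.T) / 2                     # a random symmetric test matrix
e1 = np.eye(5)[0]
XT, XTp = proj_T(X, e1), proj_T_perp(X, e1)
```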


2.1. Recovery by dual certificates. The proof of Theorem 1 will be based on dual certificates, as in Huai. A dual certificate is an optimal variable for the dual problem to (3). Its existence certifies the correctness of a minimizer to (3).

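The first-order optimality conditions at $X_0$ for (3) are given by

(7)  $$Y = \mathcal{A}^* \lambda,$$

(8)  $$\lambda \in \partial \|\cdot\|_1(-\varepsilon),$$

(9)  $$Y \succeq 0,$$

(10)  $$\langle Y, X_0 \rangle = 0,$$

where $\partial \|\cdot\|_1(-\varepsilon)$ is the subdifferential of the $\ell_1$ norm evaluated at $-\varepsilon$. Note that (9) and (10) imply $Y_T = 0$. Such a $Y$ would be a dual certificate for (3). Unfortunately, constructing a $Y$ that exactly satisfies these conditions is difficult. As in man, we seek an inexact dual certificate, which approximately satisfies these conditions. Specifically, we will build a dual certificate $Y = \mathcal{A}^* \lambda$ that satisfies

(11)  $$Y_{T^\perp} \succeq I_{T^\perp},$$

(12)  $$\|Y_T\|_F \leq 1/2,$$

(13)  $$\lambda_i = -\frac{\gamma}{m} \operatorname{sgn}(\varepsilon_i) \;\text{ if } \varepsilon_i \neq 0, \qquad |\lambda_i| \leq \frac{\gamma}{m} \;\text{ if } \varepsilon_i = 0.$$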
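Conditions (11)-(13) are easy to test for a candidate $\lambda$. The checker below is a hypothetical helper, not part of the paper; it takes $\gamma$ as an input and tests (11) by restricting $Y$ to an orthonormal basis of $x_0^\perp$ obtained from an SVD of $x_0^T$.

```python
import numpy as np


def check_inexact_certificate(A, lam, eps, x0, gamma, tol=1e-9):
    """Check conditions (11)-(13) for Y = A^* lam; rows of A are the a_i.

    Hypothetical helper; returns a dict of booleans, one per condition.
    """
    m, n = A.shape
    Y = (A * lam[:, None]).T @ A                  # Y = sum_i lam_i a_i a_i^T
    # Rows 1..n-1 of V^T in the SVD of the 1 x n matrix x0^T span x0^perp.
    _, _, Vt = np.linalg.svd(x0.reshape(1, -1))
    B = Vt[1:].T                                  # n x (n-1) orthonormal basis
    u = x0 / np.linalg.norm(x0)
    Q = np.eye(n) - np.outer(u, u)
    Y_T = Y - Q @ Y @ Q                           # projection of Y onto T
    # (11): Y_{T^perp} >= I_{T^perp}, i.e. every eigenvalue of B^T Y B is >= 1.
    cond11 = np.linalg.eigvalsh(B.T @ Y @ B).min() >= 1 - tol
    cond12 = np.linalg.norm(Y_T, "fro") <= 0.5 + tol
    alpha = gamma / m
    on_support = eps != 0
    cond13 = bool(
        np.allclose(lam[on_support], -alpha * np.sign(eps[on_support]))
        and np.all(np.abs(lam[~on_support]) <= alpha + tol)
    )
    return {"(11)": bool(cond11), "(12)": bool(cond12), "(13)": cond13}
```

As a toy usage example, with $x_0 = e_1$ in $\mathbb{R}^3$, measurement vectors $e_2, e_3$, $\lambda = (1, 1)$, $\varepsilon = 0$ and $\gamma = 2$, the certificate is $Y = \operatorname{diag}(0, 1, 1)$ and all three conditions hold.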


To prove that the existence of such a $Y$ guarantees successful recovery of $x_0 x_0^T$ with high probability, we will rely on two technical lemmas. The first technical lemma provides $\ell_1$-isometry bounds on $\mathcal{A}$ and was proven in [4].

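Lemma 1 ([4]). There exist constants $c_0, \gamma_0$ such that if $m \geq c_0 n$, then with probability at least $1 - e^{-\gamma_0 m}$,

(14)  $$\frac{1}{m} \|\mathcal{A}(X)\|_1 \leq \left(1 + \frac{1}{16}\right) \|X\|_1 \quad \text{for all } X,$$

(15)  $$\frac{1}{m} \|\mathcal{A}(X)\|_1 \geq 0.94 \left(1 - \frac{1}{16}\right) \|X\| \quad \text{for all symmetric, rank-2 } X.$$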
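Lemma 1 is a uniform statement over all $X$ and cannot be verified by sampling. The sketch below only checks both inequalities empirically for one random symmetric rank-2 matrix $uu^T - vv^T$; the sizes $n = 10$, $m = 4000$ are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 10, 4000
A = rng.standard_normal((m, n))                 # rows a_i ~ N(0, I_n)


def l1_ratio_terms(X):
    """Return (1/m)||A(X)||_1, the nuclear norm ||X||_1 and the spectral norm ||X||."""
    vals = np.einsum("ij,jk,ik->i", A, X, A)     # a_i^T X a_i for every i
    ev = np.linalg.eigvalsh(X)
    return np.abs(vals).mean(), np.abs(ev).sum(), np.abs(ev).max()


# A symmetric rank-2 test matrix with one positive and one negative eigenvalue.
u, v = rng.standard_normal(n), rng.standard_normal(n)
X = np.outer(u, u) - np.outer(v, v)
emp, nuc, spec = l1_ratio_terms(X)
```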


We will need simultaneous control of the $\ell_1$-isometry properties over several subsets of measurements.
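Lemma 2. There exists a constant $\gamma_0$ such that the following holds. Let $m \geq 100 c_0 n$, and fix a support set $S$ with $|S| = \lceil 0.01 m \rceil$. There is an event $E_S$ with probability at least $1 - e^{-\gamma_0 m}$ on which

(16)  $$\frac{1}{|S^c|} \|\mathcal{A}_{S^c}(X)\|_1 \geq 0.94 \left(1 - \frac{1}{16}\right) \|X\| \quad \text{for all symmetric rank-2 } X,$$

(17)  $$\frac{1}{|S|} \|\mathcal{A}_S(X)\|_1 \leq \left(1 + \frac{1}{16}\right) \|X\|_1 \quad \text{for all } X,$$

(18)  $$\frac{1}{m} \|\mathcal{A}(X)\|_1 \leq \left(1 + \frac{1}{16}\right) \|X\|_1 \quad \text{for all } X.$$

The proof of Lemma 2 is immediate from Lemma 1, applied to the measurements indexed by $S$, by $S^c$, and by $\{1, \ldots, m\}$, together with a union bound.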

In order to prove that a dual certificate guarantees recovery, we establish a technical result: with high probability, any optimal solution $X_0 + H$ to (3) lies near the cone $\|H_{T^\perp}\|_1 \geq 0.56 \|H_T\|_F$. This property is a strong version of injectivity on $T$.

Lemma 3. Fix a support set $S$ with $|S| = \lceil 0.01 m \rceil$ and let $m \geq 100 c_0 n$. On the event $E_S$ from Lemma 2, for all $x_0$ and for all $\varepsilon$ with $\operatorname{supp}(\varepsilon) \subseteq S$, any optimal $X_0 + H$ satisfies $\|H_{T^\perp}\|_1 \geq 0.56 \|H_T\|_F - \frac{2}{m} \|\eta\|_1$.

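Proof. By optimality of $X_0 + H$, $\|\mathcal{A}H - \varepsilon - \eta\|_1 \leq \|\varepsilon + \eta\|_1$. By the additivity of the $\ell_1$ norm over vectors with disjoint supports,

(19)  $$\|\mathcal{A}_{S^c} H - \eta_{S^c}\|_1 + \|\mathcal{A}_S H - \varepsilon - \eta_S\|_1 \leq \|\varepsilon + \eta_S\|_1 + \|\eta_{S^c}\|_1.$$

By the triangle inequality, we have

(20)  $$\|\mathcal{A}_{S^c} H - \eta_{S^c}\|_1 + \|\varepsilon + \eta_S\|_1 - \|\mathcal{A}_S H\|_1 \leq \|\varepsilon + \eta_S\|_1 + \|\eta_{S^c}\|_1,$$

which implies

(21)  $$\|\mathcal{A}_{S^c} H\|_1 \leq 2 \|\eta_{S^c}\|_1 + \|\mathcal{A}_S H\|_1.$$

Breaking $H$ into its components on $T$ and $T^\perp$ and applying the triangle inequality, we have

(22)  $$\|\mathcal{A}_{S^c} H_T\|_1 - \|\mathcal{A}_{S^c} H_{T^\perp}\|_1 \leq 2 \|\eta_{S^c}\|_1 + \|\mathcal{A}_S H_T\|_1 + \|\mathcal{A}_S H_{T^\perp}\|_1,$$

(23)  $$\|\mathcal{A}_{S^c} H_T\|_1 \leq 2 \|\eta_{S^c}\|_1 + \|\mathcal{A}_S H_T\|_1 + \|\mathcal{A} H_{T^\perp}\|_1.$$

We now apply the $\ell_1$-isometry bounds from Lemma 2 to each term of (23). On the event $E_S$,

(24)  $$\|\mathcal{A}_{S^c} H_T\|_1 \geq 0.94 \left(1 - \frac{1}{16}\right) |S^c| \, \|H_T\|$$

(25)  $$\geq 0.94 \left(1 - \frac{1}{16}\right) \frac{|S^c|}{\sqrt{2}} \|H_T\|_F,$$

where the second inequality follows because $H_T$ has rank at most 2. On the event $E_S$,

(26)  $$\|\mathcal{A}_S H_T\|_1 \leq |S| \left(1 + \frac{1}{16}\right) \|H_T\|_1$$

(27)  $$\leq |S| \left(1 + \frac{1}{16}\right) \sqrt{2} \, \|H_T\|_F.$$

On the event $E_S$,

(28)  $$\|\mathcal{A} H_{T^\perp}\|_1 \leq m \left(1 + \frac{1}{16}\right) \|H_{T^\perp}\|_1.$$

Combining (23)-(28) and dividing by $(1 + \frac{1}{16}) m$, we have

(29)  $$\left( \frac{0.94 (1 - \frac{1}{16}) |S^c|}{\sqrt{2} (1 + \frac{1}{16}) m} - \frac{\sqrt{2} |S|}{m} \right) \|H_T\|_F \leq \frac{2}{(1 + \frac{1}{16}) m} \|\eta\|_1 + \|H_{T^\perp}\|_1.$$

Since $|S| = \lceil 0.01 m \rceil$, the coefficient on the left is at least 0.56. Thus, $0.56 \|H_T\|_F \leq \frac{2}{m} \|\eta\|_1 + \|H_{T^\perp}\|_1$ on the event $E_S$.

□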
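The constant 0.56 in Lemma 3 comes from evaluating the coefficient in (29). A quick arithmetic check, treating $|S|/m$ as exactly $0.01$ (the ceiling is ignored here), is:

```python
import math

s = 1 / 16
frac_S = 0.01                  # |S| / m, ignoring the ceiling in |S| = ceil(0.01 m)
frac_Sc = 1 - frac_S           # |S^c| / m
# Coefficient of ||H_T||_F on the left-hand side of (29).
coef = 0.94 * (1 - s) * frac_Sc / (math.sqrt(2) * (1 + s)) - math.sqrt(2) * frac_S
# Coefficient of ||eta||_1 / m on the right-hand side of (29).
noise_coef = 2 / (1 + s)
```

The coefficient evaluates to roughly 0.566, so rounding down to 0.56 leaves a little slack for the ceiling in $|S|$.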


We may now prove that the existence of an inexact dual certificate satisfying (11)-(13) guarantees successful recovery of a matrix near $X_0 = x_0 x_0^T$ with high probability, provided that there are few enough arbitrary errors.
Lemma 4. There exists a $C$ such that the following holds. Fix $S$ such that $|S| = \lceil 0.01 m \rceil$. Let $m \geq 100 c_0 n$. Fix $x_0 \in \mathbb{R}^n$. Fix $\varepsilon \in \mathbb{R}^m$ such that $\operatorname{supp}(\varepsilon) \subseteq S$. Suppose that there exists $Y = \mathcal{A}^* \lambda$ satisfying (11)-(13). Then, on the event $E_S$ from Lemma 2, a minimizer $\hat X$ of (3) satisfies $\|\hat X - X_0\|_F \leq C \|\eta\|_1 / m$.

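Proof. Let $\hat X = X_0 + H$ be a minimizer for (3), which implies that $\|\mathcal{A}(X_0 + H) - b\|_1 \leq \|\mathcal{A}(X_0) - b\|_1$. That is,

$$\|\mathcal{A}H - \varepsilon - \eta\|_1 \leq \|\varepsilon + \eta\|_1.$$

Letting $\alpha = \gamma / m$, condition (13) gives

(30)  $$\lambda / \alpha \in \partial \|\cdot\|_1(-\varepsilon).$$

Hence, by the subgradient inequality and the triangle inequality,

(31)  $$\|-\varepsilon\|_1 + \langle \lambda / \alpha, \mathcal{A}H - \eta \rangle \leq \|\mathcal{A}H - \varepsilon - \eta\|_1 \leq \|\varepsilon\|_1 + \|\eta\|_1$$

(32)  $$\Longrightarrow \langle \lambda / \alpha, \mathcal{A}H - \eta \rangle \leq \|\eta\|_1$$

(33)  $$\Longrightarrow \langle \lambda, \mathcal{A}H - \eta \rangle \leq \alpha \|\eta\|_1$$

(34)  $$\Longrightarrow \langle Y, H \rangle \leq \langle \lambda, \eta \rangle + \alpha \|\eta\|_1 \leq 2 \alpha \|\eta\|_1,$$

where the last line uses $\langle Y, H \rangle = \langle \mathcal{A}^* \lambda, H \rangle = \langle \lambda, \mathcal{A}H \rangle$ and $\|\lambda\|_\infty \leq \alpha$. Decomposing (34) into $T$ and $T^\perp$, we have

(35)  $$\langle Y_T, H_T \rangle \leq -\langle Y_{T^\perp}, H_{T^\perp} \rangle + 2 \alpha \|\eta\|_1.$$

As $Y_{T^\perp} \succeq 0$ by (11), and $H_{T^\perp} = (X_0 + H)_{T^\perp} \succeq 0$, the inner product $\langle Y_{T^\perp}, H_{T^\perp} \rangle$ is nonnegative, and we have

(36)  $$|\langle Y_{T^\perp}, H_{T^\perp} \rangle| \leq |\langle Y_T, H_T \rangle| + 2 \alpha \|\eta\|_1.$$

By conditions (11) and (12),

(37)  $$\|H_{T^\perp}\|_1 \leq |\langle Y_{T^\perp}, H_{T^\perp} \rangle| \leq |\langle Y_T, H_T \rangle| + 2 \alpha \|\eta\|_1 \leq \frac{1}{2} \|H_T\|_F + 2 \alpha \|\eta\|_1.$$

By Lemma 3, on the event $E_S$,

(38)  $$\|H_{T^\perp}\|_1 \geq 0.56 \|H_T\|_F - \frac{2}{m} \|\eta\|_1.$$

Combining (37) and (38) and using $\alpha = \gamma / m$, we get

(39)  $$0.56 \|H_T\|_F \leq \frac{2}{m} \|\eta\|_1 + 0.5 \|H_T\|_F + \frac{2\gamma}{m} \|\eta\|_1.$$

So,

(40)  $$\|H_T\|_F \leq \frac{C'}{m} \|\eta\|_1 \quad \text{and thus} \quad \|H_{T^\perp}\|_F \leq \|H_{T^\perp}\|_1 \leq \frac{C'}{m} \|\eta\|_1$$

for some $C'$. We conclude $\|\hat X - X_0\|_F = \|H\|_F \leq C \|\eta\|_1 / m$ for some $C$.

□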
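To make "for some $C$" concrete, one admissible constant can be read off from (37)-(39). The sketch assumes an illustrative value $\gamma = 7$ (the paper does not fix $\gamma$ in this section) and combines the two pieces with the triangle inequality $\|H\|_F \leq \|H_T\|_F + \|H_{T^\perp}\|_F$.

```python
def lemma4_constant(gamma):
    """One admissible C in Lemma 4 derived from (37)-(39), for a given gamma.

    All bounds are in units of r = ||eta||_1 / m.
    """
    # (39): 0.56 h_T <= (2 + 2*gamma) * r + 0.5 h_T.
    c_T = (2 + 2 * gamma) / (0.56 - 0.5)      # ||H_T||_F <= c_T * r
    # (37): ||H_{T^perp}||_F <= ||H_{T^perp}||_1 <= 0.5 ||H_T||_F + 2 * gamma * r.
    c_perp = 0.5 * c_T + 2 * gamma
    # Triangle inequality over the orthogonal decomposition H = H_T + H_{T^perp}.
    return c_T + c_perp


C_example = lemma4_constant(7.0)              # gamma = 7 is only an illustrative value
```

The small gap $0.56 - 0.5$ between Lemma 3 and condition (12) is what makes this constant large (about 414 for $\gamma = 7$); the argument only needs it to be finite.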


2.2. Construction of the dual certificate. We now construct the dual certificate for arbitrary $x_0$. Our construction will be a modification of the dual certificate in [2]. Also similar to [2], we will build dual certificates with high probability on a net of $x_0$'s. We will then use a continuity argument to get a dual certificate for an arbitrary $x_0$.

Let $S^+$ and $S^-$ be disjoint supersets of the indices over which $\varepsilon$ is positive or negative, respectively. Let $S = S^+ \cup S^-$. For pedagogical purposes, $S^+$ and $S^-$ should be thought of as exactly the indices over which $\varepsilon$ is positive or negative. For technical reasons, we let them be supersets of cardinality linear in $n$, in order to use standard probability bounds. For a fixed choice of $S^+$ and $S^-$, let the inexact dual certificate $Y$ be defined by

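(41)  $$Y = \frac{1}{m} \left[ \sum_{i \in S^+} (-\gamma)\, a_i a_i^T + \sum_{i \in S^-} \gamma\, a_i a_i^T + \sum_{i \in S^c} \left( \beta_0 - \frac{|\langle a_i, x_0 \rangle|^2}{\|x_0\|^2} \, \mathbb{1}\!\left( \frac{|\langle a_i, x_0 \rangle|}{\|x_0\|} \leq 3 \right) \right) a_i a_i^T \right],$$

where $\beta_0 = \mathbb{E}\, z^4 \mathbb{1}(|z| \leq 3) \approx 2.6728$ and $z$ is a standard normal random variable. We will refer to the three terms on the right-hand side of (41) as $Y_+$, $Y_-$, and $Y_0$, respectively.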
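A numpy sketch of construction (41) is below. It computes $\beta_0$ in closed form by integrating $z^4 \varphi(z)$ over $[-3, 3]$ by parts, giving $3\,\mathrm{erf}(3/\sqrt{2}) - 72\,\varphi(3)$, and builds $Y$ and $\lambda$ for given $S^\pm$. The helper name, the value $\gamma = 7$, and the sizes in the test ($n = 5$, $m = 20000$, $|S^\pm| = \lceil 0.001 m \rceil = 20$) are illustrative assumptions; a generous ratio $m/n$ is used so that a single draw is stable.

```python
import math
import numpy as np

# beta_0 = E[z^4 1(|z| <= 3)] = 3 erf(3/sqrt 2) - (2*3^3 + 6*3) * phi(3).
phi3 = math.exp(-4.5) / math.sqrt(2 * math.pi)   # standard normal density at 3
beta0 = 3 * math.erf(3 / math.sqrt(2)) - (2 * 27 + 6 * 3) * phi3


def build_certificate(A, x0, S_plus, S_minus, gamma):
    """Hypothetical helper implementing (41): returns (Y, lam), Y = sum_i lam_i a_i a_i^T."""
    m = A.shape[0]
    z = A @ x0 / np.linalg.norm(x0)               # <a_i, x0> / ||x0||
    lam = (beta0 - z**2 * (np.abs(z) <= 3)) / m   # truncated coefficients (Y_0)
    lam[S_plus] = -gamma / m                      # coefficients of Y_+
    lam[S_minus] = gamma / m                      # coefficients of Y_-
    Y = (A * lam[:, None]).T @ A
    return Y, lam
```

The truncation keeps every $Y_0$ coefficient in $[(\beta_0 - 9)/m, \beta_0/m]$, so its magnitude stays below $\gamma/m$ for $\gamma \geq 9 - \beta_0 \approx 6.33$.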

The form of $Y_0$ is due to [2], and the intuition behind it is as follows. Note that $\mathbb{E}(a_i a_i^T) = I_n$ and

$$\mathbb{E}\left( |\langle a_i, e_1 \rangle|^2 a_i a_i^T \right) = \begin{pmatrix} \beta_0 & 0 \\ 0 & I_{n-1} \end{pmatrix},$$

where here $\beta_0 = \mathbb{E}\, z^4 = 3$ for a standard normal $z$. The construction $\frac{1}{m} \sum_{i=1}^m [\beta_0 - |\langle a_i, e_1 \rangle|^2] a_i a_i^T$ thus has expected value $(\beta_0 - 1) I_{T^\perp}$, which would provide the exact dual certificate conditions (9) and (10). As shown in [2], a satisfactory inexact dual certificate can be built with $m = O(n)$ and coefficients that are truncated to be no larger than $\gamma / m$. In the present formulation, the terms $Y_+$ and $Y_-$ are then set to have coefficients $\mp \gamma / m$ in order to satisfy (13).
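The expectation computation above can be checked by Monte Carlo for the untruncated construction with $\beta_0 = 3$: the empirical matrix should be close to $(\beta_0 - 1) I_{T^\perp}$, that is, zero on the first row and column and $2 I_{n-1}$ on the lower-right block. The sizes are arbitrary choices made large enough for the average to settle.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 5, 200_000
A = rng.standard_normal((m, n))                  # rows a_i ~ N(0, I_n)
beta0_untruncated = 3.0                          # E z^4 for a standard normal z
w = beta0_untruncated - A[:, 0] ** 2             # beta_0 - |<a_i, e1>|^2
Y = (A * w[:, None]).T @ A / m                   # (1/m) sum_i w_i a_i a_i^T

expected = np.zeros((n, n))
expected[1:, 1:] = (beta0_untruncated - 1) * np.eye(n - 1)   # (beta_0 - 1) I_{T^perp}
max_dev = np.abs(Y - expected).max()
```

Without truncation the weights $3 - z^2$ are unbounded, which is why the construction (41) caps them at $|z| \leq 3$ before using them as coefficients $\lambda_i$.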

We now show that for a fixed signal and a fixed pattern of signs of $\varepsilon$, a dual certificate exists with high probability.

Lemma 5. There exist constants $c, \gamma^*$ such that the following holds. Let $m \geq cn$. Fix $x_0 \in \mathbb{R}^n$ and $\varepsilon \in \mathbb{R}^m$. Let $S^+$ and $S^-$ be fixed disjoint sets of cardinality $\lceil 0.001 m \rceil$, chosen as above. Then the dual certificate $Y$ from (41) satisfies (13), $\|Y_T\|_F \leq 1/4$, and $\|Y_{T^\perp} - \frac{17}{10} I_{T^\perp}\| \leq \frac{3}{10}$ with probability at least $1 - e^{-\gamma^* m}$.


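Proof. Since the Gaussian measurements are rotationally invariant and (41) depends on $x_0$ only through $x_0 / \|x_0\|$, it suffices without loss of generality to assume $x_0 = e_1$. Condition (13) holds by construction: the coefficients of $Y_\pm$ are $\mp \gamma / m$, and each coefficient of $Y_0$ lies in $[(\beta_0 - 9)/m, \beta_0 / m]$, so it has magnitude at most $\gamma / m$ whenever $\gamma \geq 9 - \beta_0$. It suffices to show that with high probability

(42)  $$\left\| Y_{0,T^\perp} - \frac{17}{10} I_{T^\perp} \right\| \leq 0.15,$$

(43)  $$\|Y_{0,T}\|_F \leq \frac{3}{20},$$

(44)  $$\|Y_{\pm,T^\perp}\| \leq 0.015,$$

(45)  $$\|Y_{\pm,T}\|_F \leq 0.035.$$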


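First, we establish (42) and (43). By Lemma 2.3 in [2], there exist $\tilde c, \tilde \gamma$ such that if $|S^c| \geq \tilde c n$, then with probability at least $1 - e^{-\tilde \gamma |S^c|}$,

(46)  $$\left\| \frac{m}{|S^c|} Y_{0,T^\perp} - \frac{17}{10} I_{T^\perp} \right\| \leq \frac{1}{10}, \quad \text{and}$$

(47)  $$\left\| \frac{m}{|S^c|} Y_{0,T} \right\|_F \leq \frac{3}{20}.$$

Thus,

$$\left\| Y_{0,T^\perp} - \frac{17}{10} I_{T^\perp} \right\| \leq \frac{|S^c|}{m} \left\| \frac{m}{|S^c|} Y_{0,T^\perp} - \frac{17}{10} I_{T^\perp} \right\| + \left( 1 - \frac{|S^c|}{m} \right) \frac{17}{10} \leq 0.15,$$

which establishes (42). By (47), we get (43) immediately.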

Next, we establish (44). Let $a_i'$ be the vector formed by the last $n - 1$ components of $a_i$. Observe that $\sum_{i \in S^+} a_i' a_i'^T$ is a Wishart random matrix. Standard estimates for singular values of random matrices with Gaussian i.i.d. entries, such as Corollary 5.35 in mi, apply. If $|S^+| = \lceil 0.001 m \rceil \geq c_1 n$, then with probability at least $1 - e^{-\gamma_1 m}$ for some $\gamma_1$,

$$\left\| \frac{1}{|S^+|} \sum_{i \in S^+} a_i' a_i'^T - I_{n-1} \right\| \leq \frac{1}{2}.$$

Hence,

$$\left\| \frac{1}{|S^+|} \sum_{i \in S^+} a_i' a_i'^T \right\| \leq \frac{3}{2}.$$

Thus, $\|Y_{+,T^\perp}\| \leq \frac{\gamma |S^+|}{m} \cdot \frac{3}{2}$, and we arrive at (44). The bound for the $Y_{-,T^\perp}$ term is identical.

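Next, we establish (45). Note that $Y_+ = -\frac{\gamma |S^+|}{m} \cdot \frac{1}{|S^+|} \sum_{i \in S^+} a_i a_i^T$. As per Lemma 6, if $|S^+| = \lceil 0.001 m \rceil \geq c_1 n$, then $\|Y_{+,T}\|_F \leq \frac{\gamma |S^+|}{m} \cdot 5 \leq \gamma \cdot 0.001 \cdot 5$, with probability at least $1 - e^{-\gamma_1 m}$.

Thus (42)-(45) hold simultaneously with probability at least $1 - e^{-\gamma^* m}$ for some $\gamma^*$, provided that $c \geq \max(2 \tilde c, 1000 c_0, 1000 c_1)$.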

□ 
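The numerical budgets in the proof of Lemma 5 fit together as follows, checked here in exact rational arithmetic. The value $\gamma = 7$ used for (44) and (45) is an illustrative assumption, and the ceilings in $|S^\pm| = \lceil 0.001 m \rceil$ are ignored.

```python
from fractions import Fraction as F

# Y = Y_0 + Y_+ + Y_-, so the individual bounds (42)-(45) add up.
t_perp = F(15, 100) + 2 * F(15, 1000)   # bound on ||Y_{T^perp} - 1.7 I_{T^perp}||
t_T = F(3, 20) + 2 * F(35, 1000)        # bound on ||Y_T||_F
# From (46) to (42), with |S^c|/m >= 1 - 2 * 0.001.
frac = 1 - 2 * F(1, 1000)
step42 = frac * F(1, 10) + (1 - frac) * F(17, 10)
# (44) and (45) with the illustrative value gamma = 7.
gamma = 7
b44 = gamma * F(1, 1000) * F(3, 2)
b45 = gamma * F(1, 1000) * 5
```

The totals are $0.18 \leq 3/10$ and $0.22 \leq 1/4$, so each piece has some room to spare.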


The behavior of $Y_{\pm,T}$ relies on the following probability estimate for the behavior of a Gaussian Wishart matrix on $T$.

Lemma 6. Let $x_0 = e_1$. Let $A = \frac{1}{m} \sum_{i=1}^m a_i a_i^T$. There exist $c_1, \gamma_1$ such that if $m \geq c_1 n$, then $\|A_T\|_F \leq 5$ with probability at least $1 - e^{-\gamma_1 m}$.
Proof. Let $y$ be the $(1,1)$ entry of $A$, and let $y'$ be the rest of the first column of $A$. Then $\|A_T\|_F^2 = y^2 + 2 \|y'\|_2^2$. Note that $y = \frac{1}{m} \sum_{i=1}^m a_i(1)^2$. Hence, $my \sim \chi^2_m$. Standard results on the concentration of chi-squared variables, such as Lemma 1 in m, give

$$\mathbb{P}(my > 4m) \leq e^{-\gamma_{1,2} m}$$

for some $\gamma_{1,2}$. Hence $\mathbb{P}(y^2 > 16) \leq e^{-\gamma_{1,2} m}$.

Now, it remains to bound $\|y'\|_2^2$. We can write $y' = \frac{1}{m} Z' c$, where $Z' = [a_1', \ldots, a_m']$ and $c_i = a_i(1)$, and where $Z'$ and $c$ are independent. Note that $\|c\|_2^2$ is a chi-squared random variable with $m$ degrees of freedom. Hence, with probability at least $1 - e^{-\gamma_{1,2} m}$,

$$\|c\|_2^2 \leq 4m.$$

For fixed $x$ with $\|x\|_2 = 1$, $\|Z' x\|_2^2 \sim \chi^2_{n-1}$, and hence with probability at least $1 - e^{-\gamma_{1,3} m}$,

$$\|Z' x\|_2^2 \leq m$$

when $m \geq c_1 n$. Since $Z'$ and $c$ are independent, we may apply this bound with $x = c / \|c\|_2$. Hence, $m^2 \|y'\|_2^2 \leq 4m \cdot m$ with probability at least $1 - 2e^{-\gamma_{1,4} m}$. So, $\|y'\|_2 \leq 2$ with probability at least $1 - 2e^{-\gamma_{1,4} m}$.

So $\|A_T\|_F^2 \leq 16 + 2 \cdot 4 < 25$, and hence $\|A_T\|_F \leq 5$ with probability at least $1 - e^{-\gamma_1 m}$ for some $\gamma_1$.

□ 
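A Monte Carlo sketch of the quantities in Lemma 6 for $x_0 = e_1$: in practice $y \approx 1$ and $\|y'\|_2^2 \approx (n-1)/m$, so $\|A_T\|_F$ is close to 1, well inside the bound of 5. The sizes $n = 20$, $m = 2000$ are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 20, 2000
a_rows = rng.standard_normal((m, n))        # rows a_i ~ N(0, I_n)
Amat = a_rows.T @ a_rows / m                # A = (1/m) sum_i a_i a_i^T
y = Amat[0, 0]                              # (1,1) entry; m * y ~ chi^2_m
y_prime = Amat[1:, 0]                       # rest of the first column
norm_AT = np.sqrt(y**2 + 2 * y_prime @ y_prime)   # ||A_T||_F when x0 = e1
```

The factor 2 appears because $T_{e_1}$ contains both the first row and the first column of a symmetric matrix.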


We now show that for a fixed signal and support set of gross errors, there is a high probability that a dual certificate exists simultaneously for all gross errors.

Lemma 7. Fix $x_0$ and a support set $S$. If $m \geq cn$ and $|S|/m \leq \min(0.001, \gamma^* / (2 \log 2))$, then there is an event $E_{S,x_0}$ on which, for all $\varepsilon$ with $\operatorname{supp}(\varepsilon) \subseteq S$, there exists a $Y$ satisfying (13), $\|Y_T\|_F \leq 1/4$, and $\|Y_{T^\perp} - \frac{17}{10} I_{T^\perp}\| \leq 3/10$. The probability of $E_{S,x_0}$ is at least $1 - e^{-\gamma^* m / 2}$.

Proof. Consider all of the $2^{|S|}$ possible assignments of sign to the entries of $\varepsilon$ on $S$. For each, choose an $S^+$ and $S^-$ that are disjoint, have cardinality $\lceil 0.001 m \rceil$, and are supersets of the indices assigned a positive or negative sign, respectively. Let $E_{S,x_0}$ be the event on which all sign assignments yield a $Y$ satisfying (13), $\|Y_T\|_F \leq 1/4$, and $\|Y_{T^\perp} - \frac{17}{10} I_{T^\perp}\| \leq 3/10$. By Lemma 5 and a union bound, this event has probability at least

$$1 - 2^{|S|} e^{-\gamma^* m} \geq 1 - e^{-\gamma^* m / 2}.$$


□ 
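The last inequality in the proof of Lemma 7 is the statement $|S| \log 2 - \gamma^* m \leq -\gamma^* m / 2$, which holds exactly when $|S|/m \leq \gamma^* / (2 \log 2)$. A small log-domain check follows; the value $\gamma^* = 0.05$ and the helper name are illustrative assumptions.

```python
import math


def sign_union_bound_ok(S_size, m, gamma_star):
    """Check 2^{|S|} e^{-gamma* m} <= e^{-gamma* m / 2}, compared in log form."""
    lhs = S_size * math.log(2) - gamma_star * m
    rhs = -gamma_star * m / 2
    return lhs <= rhs


gamma_star = 0.05                       # illustrative value only
m = 10_000
S_max = math.floor(min(0.001, gamma_star / (2 * math.log(2))) * m)
```

Working with logarithms avoids overflow in $2^{|S|}$ and underflow in $e^{-\gamma^* m}$ for realistic sizes.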


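We now show that for a fixed support set of gross errors, there is a high probability that a dual certificate exists simultaneously for all signals and for all gross errors.

Lemma 8. Fix a support set $S$. If $m \geq \max(c, 4 \log(201) / \gamma^*)\, n$ and $|S|/m \leq \min(0.001, \gamma^* / (2 \log 2))$, then on an event of probability at least $1 - e^{-\gamma^* m / 4}$, for all $x_0$ and for all $\varepsilon$ with $\operatorname{supp}(\varepsilon) \subseteq S$, there exists $Y = \mathcal{A}^* \lambda$ satisfying (13) with $\alpha = \gamma / m$, $\|Y_T\|_F \leq 0.4$, and $\|Y_{T^\perp} - 1.7 I_{T^\perp}\| \leq 0.44$.

Proof. By Lemma 7, for any fixed $x_0$ such that $\|x_0\| = 1$ and for all $\varepsilon$ with $\operatorname{supp}(\varepsilon) \subseteq S$, there exists a $Y = \mathcal{A}^* \lambda$ such that

(48)  $$\|\lambda\|_\infty \leq \frac{\gamma}{m},$$

(49)  $$\lambda_i = -\frac{\gamma}{m} \operatorname{sgn}(\varepsilon_i) \quad \text{for all } i \text{ with } \varepsilon_i \neq 0,$$

(50)  $$\|Y_{T^\perp} - 1.7 I_{T^\perp}\| \leq 0.3,$$

(51)  $$\|Y_T\|_F \leq 0.25$$

on the event $E_{S,x_0}$, which has probability at least $1 - e^{-\gamma^* m / 2}$. By Lemma 5.2 in m, there exists an $\epsilon$-net $\mathcal{N}_\epsilon$ of the unit sphere such that $|\mathcal{N}_\epsilon| \leq (1 + 2/\epsilon)^n$. Hence, such a $Y$ exists simultaneously for all $x_0 \in \mathcal{N}_\epsilon$ on an event of probability at least $1 - (1 + 2/\epsilon)^n e^{-\gamma^* m / 2}$. If $m \geq 4 n \log(1 + 2/\epsilon) / \gamma^*$, then such a $Y$ exists simultaneously for all $x_0 \in \mathcal{N}_\epsilon$ with probability at least $1 - e^{-\gamma^* m / 4}$.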
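The sample-size condition for the net follows from $n \log(1 + 2/\epsilon) - \gamma^* m / 2 \leq -\gamma^* m / 4$ whenever $m \geq 4 n \log(1 + 2/\epsilon) / \gamma^*$. With the choice $\epsilon = 0.01$ made later in the proof, $1 + 2/\epsilon = 201$, which is where $4 \log(201) / \gamma^*$ in Lemma 8 comes from. The values $\gamma^* = 0.05$ and $n = 100$ are illustrative assumptions.

```python
import math


def net_failure_log_bound(n, m, net_eps, gamma_star):
    """log of (1 + 2/eps)^n * e^{-gamma* m / 2}: union bound over the eps-net."""
    return n * math.log(1 + 2 / net_eps) - gamma_star * m / 2


net_eps = 0.01                      # net radius chosen in the continuity step
gamma_star = 0.05                   # illustrative value only
n = 100
m_min = 4 * n * math.log(1 + 2 / net_eps) / gamma_star
```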

We now appeal to a continuity argument to show that a dual certificate exists for points not on the net $\mathcal{N}_\epsilon$. For an arbitrary $x$ such that $\|x\|_2 = 1$, we consider the $Y$ corresponding to the nearest $x_0 \in \mathcal{N}_\epsilon$. Note that $\|x - x_0\| \leq \epsilon$ by definition of the net $\mathcal{N}_\epsilon$. We now closely follow the proof and notation of Corollary 2.4 in [2] to show that $Y$ is a satisfactory approximate dual certificate for $x$.
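Note that $\|Y\| \leq \|Y_T\|_F + \|Y_{T_{x_0}^\perp}\| \leq 0.25 + 2 \leq 2.5$ by (50) and (51). Let $\Delta = x x^T - x_0 x_0^T$ and note that $\|\Delta\|_F \leq 2\epsilon$. Let $T = T_x$. Now we have

(52)  $$Y_{T^\perp} - 1.7 I_{T^\perp} = Y_{T_{x_0}^\perp} - 1.7 I_{T_{x_0}^\perp} - R_1,$$

(53)  $$Y_T = Y_{T_{x_0}} + R_2,$$

where

(54)  $$R_1 = \Delta Y (I - x_0 x_0^T) + (I - x_0 x_0^T) Y \Delta - \Delta Y \Delta - 1.7 \Delta,$$

(55)  $$R_2 = \Delta Y (I - x_0 x_0^T) + (I - x_0 x_0^T) Y \Delta - \Delta Y \Delta.$$

We observe that

(56)  $$\|R_1\| \leq 2 \|Y\| \|\Delta\| \|I - x_0 x_0^T\| + \|Y\| \|\Delta\|^2 + 1.7 \|\Delta\| \leq 13.4 \epsilon + 10 \epsilon^2,$$

(57)  $$\|R_2\|_F \leq 2 \|Y\| \|\Delta\|_F \|I - x_0 x_0^T\| + \|Y\| \|\Delta\|_F \|\Delta\| \leq 10 (\epsilon + \epsilon^2) \leq 10 \sqrt{2} (\epsilon + \epsilon^2).$$

If we choose $\epsilon = 0.01$, then $\|R_1\| \leq 0.135$ and $\|R_2\|_F \leq 0.143$, and

(58)  $$\|Y_{T^\perp} - 1.7 I_{T^\perp}\| \leq 0.3 + 0.135 = 0.435,$$

(59)  $$\|Y_T\|_F \leq 0.25 + 0.143 = 0.393.$$

□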
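The perturbation identities (52)-(55) and the bounds (56)-(57) can be sanity-checked numerically. The sketch draws a random unit $x_0$, a nearby unit $x$, and a random symmetric $Y$ rescaled to $\|Y\| = 2.5$; these are arbitrary test inputs, and the check uses the actual distance $\|x - x_0\|$ in place of $\epsilon$.

```python
import numpy as np


def perp(x):
    """I - x x^T for a unit vector x, i.e. I_{T_x^perp} as a matrix."""
    return np.eye(len(x)) - np.outer(x, x)


rng = np.random.default_rng(0)
n, net_eps = 6, 0.01
x0 = rng.standard_normal(n)
x0 /= np.linalg.norm(x0)
x = x0 + 0.002 * rng.standard_normal(n)
x /= np.linalg.norm(x)
eps_eff = np.linalg.norm(x - x0)                     # plays the role of epsilon
M = rng.standard_normal((n, n))
Y = (M + M.T) / 2
Y *= 2.5 / np.linalg.norm(Y, 2)                      # enforce ||Y|| = 2.5

P0 = perp(x0)
Delta = np.outer(x, x) - np.outer(x0, x0)
R2 = Delta @ Y @ P0 + P0 @ Y @ Delta - Delta @ Y @ Delta   # (55)
R1 = R2 - 1.7 * Delta                                       # (54)
lhs52 = perp(x) @ Y @ perp(x) - 1.7 * perp(x)
rhs52 = P0 @ Y @ P0 - 1.7 * P0 - R1
YT = Y - perp(x) @ Y @ perp(x)                       # Y_T for T = T_x
YT0 = Y - P0 @ Y @ P0                                # Y_{T_{x0}}
```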


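We can now prove Theorem 1.

Proof of Theorem 1. Assume that $|S|/m \leq \min(0.001, \gamma^* / (2 \log 2))$ and $m \geq \max(c, 4 \log(201) / \gamma^*)\, n$. By Lemma 8, there is an event of probability at least $1 - e^{-\gamma^* m / 4}$ such that for all $x_0$ and for all $\varepsilon$ with $\operatorname{supp}(\varepsilon) \subseteq S$, there exists a $Y = \mathcal{A}^* \lambda$ satisfying (11)-(13); indeed, $\|Y_{T^\perp} - 1.7 I_{T^\perp}\| \leq 0.44$ gives $Y_{T^\perp} \succeq 1.26\, I_{T^\perp}$, and $\|Y_T\|_F \leq 0.4 \leq 1/2$. Choose a superset $\tilde S \supseteq S$ such that $|\tilde S| = \lceil 0.01 m \rceil$. On the intersection of this event with $E_{\tilde S}$, Lemma 4 guarantees that $\|\hat X - X_0\|_F \leq C \|\eta\|_1 / m$. The intersection of these events has probability at least $1 - e^{-\gamma m}$ for some $\gamma$.

The proof of the error estimate $\|\hat x_0 - \pm x_0\| \leq C' \min\left( \|x_0\|, \frac{\|\eta\|_1}{m \|x_0\|} \right)$ can be found in [2]. □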